GH-45788: [C++][Acero] Fix data race in aggregate node #45789
Conversation
Hi @pitrou, would you like to take a look? Thanks.
@github-actions crossbow submit emscripten

Revision: 9d8ddbb
Submitted crossbow builds: ursacomputing/crossbow @ actions-73916ce3ac
Can we also make `Segmenter::GetSegments` a const fn? (Or I can do it in a separate patch.)

I guess not; see `arrow/cpp/src/arrow/compute/row/grouper.cc` line 163 and line 273 (in 7e18764).
```diff
@@ -312,7 +312,7 @@ Result<ExecBatch> GroupByNode::Finalize() {
                        segment_key_field_ids_.size());

   // Segment keys come first
-  PlaceFields(out_data, 0, segmenter_values_);
+  PlaceFields(out_data, 0, state->segmenter_values);
```
So, the `Finalize` step only considers the segmenter values for `state[0]`? I'm not sure I understand why.
See [1]: the `Finalize` call follows a `Merge` call, by which point the states of all threads have been merged into `state[0]`.

But this raises an interesting question that I only realized now: the proposed thread-local `segmenter_values` are never merged, so how can I append `segmenter_values` only in `state[0]`? This seems wrong, but it turns out to be OK because of some implicit assumptions quite far away:

- If `segmenter_values_` is not empty, then at least one segment key is specified, so we are executing single-threaded and only `state[0]` exists. This is detailed in my answer to the other comment.
- Otherwise, there are no segment keys and we are appending nothing.
Even though the fix actually works, it "feels" weirder than before: a shared `segmenter_values_` seems, at least conceptually, more reasonable (though it has a race and isn't really correct in a multi-threaded context). This makes me hesitate about the current fix. I think it all boils down to the "abstraction leak" in the original design:

- The segmenter abstraction seems designed to be independent of the assumption that there is no multi-threading for segmented cases (and honestly I am personally quite fond of this design).
- The existence of `segmenter_values_` strongly depends on the single-threaded assumption.

Now I will try to fix the race in another way that is more independent of this abstraction leak, leaving the latter to be addressed in the future.
[1] `arrow/cpp/src/arrow/acero/groupby_aggregate_node.cc`, lines 353 to 354 in fc0862a:

```cpp
RETURN_NOT_OK(Merge());
ARROW_ASSIGN_OR_RAISE(out_data_, Finalize());
```
Yes, that sounds reasonable. If we wanted to have multi-threaded segmented group-by, I suppose it would need a preparatory step to rechunk the input along segment boundaries?
It can be the case, but may not be that useful, because IMO part of the point of segmented aggregation is to emit the result in a streaming fashion: as soon as a segment is concluded to be "closed", we are sure that the current aggregation result is a valid partial result and can be output to downstream. Rechunking the input would require all input batches to be accumulated first.

To multi-thread the segmented aggregation, I would imagine the batches to be already partitioned by (and sorted by, of course; this is already implied by the current single-threaded impl) segment keys and distributed to specific threads (# partitions == # threads). This could be achieved by a special source node or a "shuffle" node.

This way each thread of the aggregate can process all rows belonging to a specific segment. This would require some modification to the current aggregate node, such as not merging other thread states, or a brand-new "partitioned aggregate" node.

Anyway, it's not trivial and can be quite restrictive to use.
BTW, I've redone the fix in a way that is less weird: c984de6 to f2af2d7.
Rationale for this change

Data race described in #45788.

What changes are included in this PR?

Put the racing member `segmenter_values` in thread-local state.

Are these changes tested?

Yes. UT added.

Are there any user-facing changes?

None.